Dataset statistics
| Number of variables | 17 |
|---|---|
| Number of observations | 9471 |
| Missing cells | 20652 |
| Missing cells (%) | 12.8% |
| Duplicate rows | 113 |
| Duplicate rows (%) | 1.2% |
| Total size in memory | 1.2 MiB |
| Average record size in memory | 136.0 B |
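The overview counts can be cross-checked by hand, and they fit together as follows:
- The 114 completely blank rows at the end of the file contribute 114 × 17 = 1938 missing cells.
- The two empty `Unnamed` columns, over the remaining 9357 rows, contribute 2 × 9357 = 18714 more.
- That gives 20652 missing cells out of 9471 × 17 = 161007, or 12.8%.
- The 114 identical blank rows also account for the 113 duplicate rows, because the first occurrence is not counted as a duplicate.

Below is a minimal plain-Python sketch of the same bookkeeping. It assumes rows are equal-length tuples with `None` for missing cells. `overview` is a hypothetical helper, not a pandas-profiling function, and it runs here on synthetic data of the same shape.

```python
from collections import Counter


def overview(rows):
    """Hypothetical helper: return (missing_cells, missing_pct, duplicate_rows).

    `rows` is a list of equal-length tuples; None marks a missing cell.
    A row counts as a duplicate only if an identical row appeared earlier,
    so k identical rows contribute k - 1 duplicates.
    """
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    missing = sum(cell is None for row in rows for cell in row)
    total = n_rows * n_cols
    missing_pct = 100.0 * missing / total if total else 0.0
    duplicates = sum(k - 1 for k in Counter(rows).values())
    return missing, missing_pct, duplicates


# Synthetic stand-in with the report's shape (not the real data):
# 9357 distinct rows with 15 filled columns and 2 empty ones,
# followed by 114 completely blank rows.
data_rows = [(i,) + (0.0,) * 14 + (None, None) for i in range(9357)]
blank_rows = [(None,) * 17] * 114
missing, pct, dups = overview(data_rows + blank_rows)
print(missing, round(pct, 1), dups)  # 20652 12.8 113
```

The duplicate rule mirrors the default behaviour of pandas' `DataFrame.duplicated()`, which flags every repeat after the first occurrence.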
Variable types
| NUM | 13 |
|---|---|
| UNSUPPORTED | 2 |
| CAT | 2 |
Warnings
| Dataset has 113 (1.2%) duplicate rows | Duplicates |
|---|---|
| Date has a high cardinality: 391 distinct values | High cardinality |
| PT08.S2(NMHC) is highly correlated with PT08.S1(CO) and 1 other field | High correlation |
| PT08.S1(CO) is highly correlated with PT08.S2(NMHC) | High correlation |
| PT08.S5(O3) is highly correlated with PT08.S2(NMHC) | High correlation |
| T is highly correlated with C6H6(GT) and 1 other field | High correlation |
| C6H6(GT) is highly correlated with T and 2 other fields | High correlation |
| RH is highly correlated with C6H6(GT) and 1 other field | High correlation |
| AH is highly correlated with C6H6(GT) and 2 other fields | High correlation |
| Date has 114 (1.2%) missing values | Missing |
| Time has 114 (1.2%) missing values | Missing |
| CO(GT) has 114 (1.2%) missing values | Missing |
| PT08.S1(CO) has 114 (1.2%) missing values | Missing |
| NMHC(GT) has 114 (1.2%) missing values | Missing |
| C6H6(GT) has 114 (1.2%) missing values | Missing |
| PT08.S2(NMHC) has 114 (1.2%) missing values | Missing |
| NOx(GT) has 114 (1.2%) missing values | Missing |
| PT08.S3(NOx) has 114 (1.2%) missing values | Missing |
| NO2(GT) has 114 (1.2%) missing values | Missing |
| PT08.S4(NO2) has 114 (1.2%) missing values | Missing |
| PT08.S5(O3) has 114 (1.2%) missing values | Missing |
| T has 114 (1.2%) missing values | Missing |
| RH has 114 (1.2%) missing values | Missing |
| AH has 114 (1.2%) missing values | Missing |
| Unnamed: 15 has 9471 (100.0%) missing values | Missing |
| Unnamed: 16 has 9471 (100.0%) missing values | Missing |
| Date is uniformly distributed | Uniform |
| Time is uniformly distributed | Uniform |
| Unnamed: 15 has an unsupported type; check whether it needs cleaning or further analysis | Unsupported |
| Unnamed: 16 has an unsupported type; check whether it needs cleaning or further analysis | Unsupported |
Reproduction
| Analysis started | 2020-11-19 01:40:56.597890 |
|---|---|
| Analysis finished | 2020-11-19 01:41:29.693271 |
| Duration | 33.1 seconds |
| Software version | pandas-profiling v2.9.0 |
| Download configuration | config.yaml |
Date
| Distinct | 391 |
|---|---|
| Distinct (%) | 4.2% |
| Missing | 114 |
| Missing (%) | 1.2% |
| Memory size | 74.0 KiB |
| Value | Count | Frequency (%) | |
| 2005/1/4 | 24 | 0.3% | |
| 2004/7/20 | 24 | 0.3% | |
| 2004/6/19 | 24 | 0.3% | |
| 2005/1/7 | 24 | 0.3% | |
| 2004/6/17 | 24 | 0.3% | |
| 2004/3/27 | 24 | 0.3% | |
| 2005/3/30 | 24 | 0.3% | |
| 2005/2/4 | 24 | 0.3% | |
| 2004/3/18 | 24 | 0.3% | |
| 2004/10/2 | 24 | 0.3% | |
| Other values (381) | 9117 | 96.3% | |
| (Missing) | 114 | 1.2% |
[Figure: Frequencies of value counts]
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
[Figure: Histogram of lengths of the category]
Length
| Max length | 10 |
|---|---|
| Median length | 9 |
| Mean length | 8.87804878 |
| Min length | 3 |
Time
| Distinct | 24 |
|---|---|
| Distinct (%) | 0.3% |
| Missing | 114 |
| Missing (%) | 1.2% |
| Memory size | 74.0 KiB |
| Value | Count | Frequency (%) | |
| 5:00:00 | 390 | 4.1% | |
| 23:00:00 | 390 | 4.1% | |
| 21:00:00 | 390 | 4.1% | |
| 3:00:00 | 390 | 4.1% | |
| 11:00:00 | 390 | 4.1% | |
| 2:00:00 | 390 | 4.1% | |
| 12:00:00 | 390 | 4.1% | |
| 4:00:00 | 390 | 4.1% | |
| 18:00:00 | 390 | 4.1% | |
| 1:00:00 | 390 | 4.1% | |
| Other values (14) | 5457 | 57.6% |
[Figure: Frequencies of value counts]
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
[Figure: Histogram of lengths of the category]
Length
| Max length | 8 |
|---|---|
| Median length | 8 |
| Mean length | 7.528032943 |
| Min length | 3 |
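Date and Time are stored as separate, non-zero-padded strings (for example `2004/3/10` and `18:00:00`), which is why the report profiles them as categorical variables. The following is a minimal sketch of combining them into one timestamp per reading. It assumes missing cells arrive as `None`; a pandas `NaN` would need converting first. `parse_timestamp` is a hypothetical helper. `datetime.strptime` accepts the unpadded month, day and hour fields.

```python
from datetime import datetime


def parse_timestamp(date_str, time_str):
    """Hypothetical helper: combine a Date ("2004/3/10") and Time ("18:00:00") string.

    Returns None when either part is missing, so blank rows can be skipped.
    """
    if date_str is None or time_str is None:
        return None
    return datetime.strptime(f"{date_str} {time_str}", "%Y/%m/%d %H:%M:%S")


first = parse_timestamp("2004/3/10", "18:00:00")
midnight = parse_timestamp("2004/3/11", "0:00:00")
print(first, midnight)  # 2004-03-10 18:00:00 2004-03-11 00:00:00
print(midnight - first)  # 6:00:00
```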
CO(GT)
| Distinct | 97 |
|---|---|
| Distinct (%) | 1.0% |
| Missing | 114 |
| Missing (%) | 1.2% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | -34.20752378 |
|---|---|
| Minimum | -200 |
| Maximum | 11.9 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 74.0 KiB |
Quantile statistics
| Minimum | -200 |
|---|---|
| 5-th percentile | -200 |
| Q1 | 0.6 |
| median | 1.5 |
| Q3 | 2.6 |
| 95-th percentile | 4.7 |
| Maximum | 11.9 |
| Range | 211.9 |
| Interquartile range (IQR) | 2 |
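Range and IQR follow directly from the rows above:
- Range = Maximum - Minimum = 11.9 - (-200) = 211.9.
- IQR = Q3 - Q1 = 2.6 - 0.6 = 2.0.

The sketch below computes the same summary in plain Python. It assumes linear interpolation between order statistics (`statistics.quantiles(..., method="inclusive")`, which matches NumPy's default `linear` method). Whether pandas-profiling uses exactly this interpolation is an assumption. `quantile_summary` is a hypothetical helper.

```python
import statistics


def quantile_summary(values):
    """Hypothetical helper mirroring the 'Quantile statistics' rows.

    Uses linear interpolation between order statistics ("inclusive" method).
    Needs at least two values.
    """
    quartiles = statistics.quantiles(values, n=4, method="inclusive")
    twentieths = statistics.quantiles(values, n=20, method="inclusive")
    lo, hi = min(values), max(values)
    return {
        "minimum": lo,
        "p5": twentieths[0],    # 5th percentile
        "q1": quartiles[0],
        "median": quartiles[1],
        "q3": quartiles[2],
        "p95": twentieths[-1],  # 95th percentile
        "maximum": hi,
        "range": hi - lo,
        "iqr": quartiles[2] - quartiles[0],
    }


print(quantile_summary([1, 2, 3, 4, 5]))
```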
Descriptive statistics
| Standard deviation | 77.65717035 |
|---|---|
| Coefficient of variation (CV) | -2.270178071 |
| Kurtosis | 0.7783055185 |
| Mean | -34.20752378 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | -1.666179502 |
| Sum | -320079.8 |
| Variance | 6030.636106 |
| Monotonicity | Not monotonic |
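Several of the rows above are derived from one another:
- Variance is the square of the standard deviation (77.657² ≈ 6030.6).
- Sum is the mean multiplied by the 9357 non-missing values (-34.208 × 9357 ≈ -320079.8).
- The coefficient of variation is the standard deviation divided by the mean (77.657 / -34.208 ≈ -2.27). That is why it comes out negative here.

Below is a minimal sketch of these derived statistics in plain Python. It assumes the sample standard deviation (n - 1 denominator) and takes MAD as the median of absolute deviations from the median, as the row label states. `descriptive` is a hypothetical helper.

```python
import statistics


def descriptive(values):
    """Hypothetical helper mirroring part of the 'Descriptive statistics' rows."""
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    return {
        "mean": mean,
        "std": std,
        "variance": std ** 2,
        "cv": std / mean,  # raises ZeroDivisionError if the mean is exactly 0
        "mad": mad,
        "sum": sum(values),
    }


print(descriptive([2, 4, 4, 4, 5, 5, 7, 9]))
```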
[Figure: Histogram with fixed size bins (bins=50)]
Common values
| Value | Count | Frequency (%) | |
| -200 | 1683 | 17.8% | |
| 1 | 305 | 3.2% | |
| 1.4 | 279 | 2.9% | |
| 1.6 | 275 | 2.9% | |
| 1.5 | 273 | 2.9% | |
| 1.1 | 262 | 2.8% | |
| 0.7 | 260 | 2.7% | |
| 1.7 | 258 | 2.7% | |
| 1.3 | 253 | 2.7% | |
| 0.8 | 251 | 2.7% | |
| Other values (87) | 5258 | 55.5% |
Minimum 5 values
| Value | Count | Frequency (%) | |
| -200 | 1683 | 17.8% | |
| 0.1 | 33 | 0.3% | |
| 0.2 | 45 | 0.5% | |
| 0.3 | 98 | 1.0% | |
| 0.4 | 160 | 1.7% |
Maximum 5 values
| Value | Count | Frequency (%) | |
| 11.9 | 1 | < 0.1% | |
| 11.5 | 1 | < 0.1% | |
| 10.2 | 2 | < 0.1% | |
| 10.1 | 1 | < 0.1% | |
| 9.9 | 1 | < 0.1% |
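Every numeric column carries a spike at exactly -200. That spike drives the extreme minimum, the negative mean and the negative coefficient of variation seen here. The columns appear to come from the UCI Air Quality dataset, which documents -200 as its missing-value tag. If so, the statistics above mix real readings with a sentinel.

Below is a minimal sketch of re-summarising a column with the sentinel treated as missing. It assumes the column is a list of numbers with `None` for blank cells. `SENTINEL` and `clean_summary` are hypothetical names.

```python
import statistics

SENTINEL = -200  # assumption: -200 marks a missing reading (UCI Air Quality convention)


def clean_summary(values, sentinel=SENTINEL):
    """Hypothetical helper: summarise a column, treating None and the sentinel as missing."""
    present = [v for v in values if v is not None]
    readings = [v for v in present if v != sentinel]
    return {
        "n_blank": len(values) - len(present),
        "n_sentinel": len(present) - len(readings),
        "mean": statistics.fmean(readings) if readings else None,
        "median": statistics.median(readings) if readings else None,
    }


# Toy column shaped like CO(GT): four readings, two sentinels, one blank cell.
column = [2.6, 2.0, -200, 2.2, None, -200, 1.2]
print(clean_summary(column))
```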
PT08.S1(CO)
| Distinct | 1042 |
|---|---|
| Distinct (%) | 11.1% |
| Missing | 114 |
| Missing (%) | 1.2% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1048.990061 |
|---|---|
| Minimum | -200 |
| Maximum | 2040 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 74.0 KiB |
Quantile statistics
| Minimum | -200 |
|---|---|
| 5-th percentile | 746 |
| Q1 | 921 |
| median | 1053 |
| Q3 | 1221 |
| 95-th percentile | 1502 |
| Maximum | 2040 |
| Range | 2240 |
| Interquartile range (IQR) | 300 |
Descriptive statistics
| Standard deviation | 329.8327099 |
|---|---|
| Coefficient of variation (CV) | 0.3144288227 |
| Kurtosis | 5.836935683 |
| Mean | 1048.990061 |
| Median Absolute Deviation (MAD) | 147 |
| Skewness | -1.721503448 |
| Sum | 9815400 |
| Variance | 108789.6165 |
| Monotonicity | Not monotonic |
[Figure: Histogram with fixed size bins (bins=50)]
Common values
| Value | Count | Frequency (%) | |
| -200 | 366 | 3.9% | |
| 973 | 30 | 0.3% | |
| 1100 | 28 | 0.3% | |
| 969 | 26 | 0.3% | |
| 938 | 26 | 0.3% | |
| 988 | 26 | 0.3% | |
| 925 | 26 | 0.3% | |
| 970 | 25 | 0.3% | |
| 987 | 25 | 0.3% | |
| 984 | 25 | 0.3% | |
| Other values (1032) | 8754 | 92.4% | |
| (Missing) | 114 | 1.2% |
Minimum 5 values
| Value | Count | Frequency (%) | |
| -200 | 366 | 3.9% | |
| 647 | 1 | < 0.1% | |
| 649 | 1 | < 0.1% | |
| 655 | 1 | < 0.1% | |
| 667 | 3 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
| 2040 | 1 | < 0.1% | |
| 2008 | 1 | < 0.1% | |
| 1982 | 1 | < 0.1% | |
| 1975 | 1 | < 0.1% | |
| 1973 | 1 | < 0.1% |
NMHC(GT)
| Distinct | 430 |
|---|---|
| Distinct (%) | 4.6% |
| Missing | 114 |
| Missing (%) | 1.2% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | -159.090093 |
|---|---|
| Minimum | -200 |
| Maximum | 1189 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 74.0 KiB |
Quantile statistics
| Minimum | -200 |
|---|---|
| 5-th percentile | -200 |
| Q1 | -200 |
| median | -200 |
| Q3 | -200 |
| 95-th percentile | 144.2 |
| Maximum | 1189 |
| Range | 1389 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 139.7890929 |
|---|---|
| Coefficient of variation (CV) | -0.8786788057 |
| Kurtosis | 18.86382399 |
| Mean | -159.090093 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 4.075784452 |
| Sum | -1488606 |
| Variance | 19540.99049 |
| Monotonicity | Not monotonic |
[Figure: Histogram with fixed size bins (bins=50)]
Common values
| Value | Count | Frequency (%) | |
| -200 | 8443 | 89.1% | |
| 66 | 14 | 0.1% | |
| 29 | 9 | 0.1% | |
| 40 | 9 | 0.1% | |
| 88 | 8 | 0.1% | |
| 93 | 8 | 0.1% | |
| 57 | 7 | 0.1% | |
| 55 | 7 | 0.1% | |
| 95 | 7 | 0.1% | |
| 84 | 7 | 0.1% | |
| Other values (420) | 838 | 8.8% | |
| (Missing) | 114 | 1.2% |
Minimum 5 values
| Value | Count | Frequency (%) | |
| -200 | 8443 | 89.1% | |
| 7 | 1 | < 0.1% | |
| 8 | 1 | < 0.1% | |
| 9 | 1 | < 0.1% | |
| 10 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
| 1189 | 1 | < 0.1% | |
| 1129 | 1 | < 0.1% | |
| 1084 | 1 | < 0.1% | |
| 1042 | 1 | < 0.1% | |
| 974 | 1 | < 0.1% |
C6H6(GT)
| Distinct | 408 |
|---|---|
| Distinct (%) | 4.4% |
| Missing | 114 |
| Missing (%) | 1.2% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.865683446 |
|---|---|
| Minimum | -200 |
| Maximum | 63.7 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 74.0 KiB |
Quantile statistics
| Minimum | -200 |
|---|---|
| 5-th percentile | 0.7 |
| Q1 | 4 |
| median | 7.9 |
| Q3 | 13.6 |
| 95-th percentile | 24.42 |
| Maximum | 63.7 |
| Range | 263.7 |
| Interquartile range (IQR) | 9.6 |
Descriptive statistics
| Standard deviation | 41.38020644 |
|---|---|
| Coefficient of variation (CV) | 22.17965032 |
| Kurtosis | 19.18865057 |
| Mean | 1.865683446 |
| Median Absolute Deviation (MAD) | 4.5 |
| Skewness | -4.508762883 |
| Sum | 17457.2 |
| Variance | 1712.321485 |
| Monotonicity | Not monotonic |
[Figure: Histogram with fixed size bins (bins=50)]
Common values
| Value | Count | Frequency (%) | |
| -200 | 366 | 3.9% | |
| 3.6 | 84 | 0.9% | |
| 2.8 | 82 | 0.9% | |
| 3.8 | 79 | 0.8% | |
| 4 | 78 | 0.8% | |
| 3.1 | 77 | 0.8% | |
| 3 | 76 | 0.8% | |
| 2.5 | 75 | 0.8% | |
| 2.9 | 73 | 0.8% | |
| 5.4 | 72 | 0.8% | |
| Other values (398) | 8295 | 87.6% | |
| (Missing) | 114 | 1.2% |
Minimum 5 values
| Value | Count | Frequency (%) | |
| -200 | 366 | 3.9% | |
| 0.1 | 2 | < 0.1% | |
| 0.2 | 8 | 0.1% | |
| 0.3 | 10 | 0.1% | |
| 0.4 | 14 | 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
| 63.7 | 1 | < 0.1% | |
| 52.1 | 1 | < 0.1% | |
| 50.8 | 1 | < 0.1% | |
| 50.7 | 1 | < 0.1% | |
| 50.6 | 1 | < 0.1% |
PT08.S2(NMHC)
| Distinct | 1246 |
|---|---|
| Distinct (%) | 13.3% |
| Missing | 114 |
| Missing (%) | 1.2% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 894.5952763 |
|---|---|
| Minimum | -200 |
| Maximum | 2214 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 74.0 KiB |
Quantile statistics
| Minimum | -200 |
|---|---|
| 5-th percentile | 471 |
| Q1 | 711 |
| median | 895 |
| Q3 | 1105 |
| 95-th percentile | 1415 |
| Maximum | 2214 |
| Range | 2414 |
| Interquartile range (IQR) | 394 |
Descriptive statistics
| Standard deviation | 342.3332516 |
|---|---|
| Coefficient of variation (CV) | 0.3826682979 |
| Kurtosis | 2.370088799 |
| Mean | 894.5952763 |
| Median Absolute Deviation (MAD) | 195 |
| Skewness | -0.7934346434 |
| Sum | 8370728 |
| Variance | 117192.0552 |
| Monotonicity | Not monotonic |
[Figure: Histogram with fixed size bins (bins=50)]
Common values
| Value | Count | Frequency (%) | |
| -200 | 366 | 3.9% | |
| 853 | 25 | 0.3% | |
| 880 | 23 | 0.2% | |
| 800 | 23 | 0.2% | |
| 859 | 23 | 0.2% | |
| 985 | 22 | 0.2% | |
| 850 | 21 | 0.2% | |
| 783 | 21 | 0.2% | |
| 769 | 21 | 0.2% | |
| 776 | 21 | 0.2% | |
| Other values (1236) | 8791 | 92.8% | |
| (Missing) | 114 | 1.2% |
Minimum 5 values
| Value | Count | Frequency (%) | |
| -200 | 366 | 3.9% | |
| 383 | 2 | < 0.1% | |
| 387 | 1 | < 0.1% | |
| 388 | 1 | < 0.1% | |
| 390 | 2 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
| 2214 | 1 | < 0.1% | |
| 2007 | 1 | < 0.1% | |
| 1983 | 1 | < 0.1% | |
| 1981 | 1 | < 0.1% | |
| 1980 | 1 | < 0.1% |
NOx(GT)
| Distinct | 926 |
|---|---|
| Distinct (%) | 9.9% |
| Missing | 114 |
| Missing (%) | 1.2% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 168.6169713 |
|---|---|
| Minimum | -200 |
| Maximum | 1479 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 74.0 KiB |
Quantile statistics
| Minimum | -200 |
|---|---|
| 5-th percentile | -200 |
| Q1 | 50 |
| median | 141 |
| Q3 | 284 |
| 95-th percentile | 653.2 |
| Maximum | 1479 |
| Range | 1679 |
| Interquartile range (IQR) | 234 |
Descriptive statistics
| Standard deviation | 257.4338663 |
|---|---|
| Coefficient of variation (CV) | 1.526737578 |
| Kurtosis | 1.505417097 |
| Mean | 168.6169713 |
| Median Absolute Deviation (MAD) | 109 |
| Skewness | 0.8252321889 |
| Sum | 1577749 |
| Variance | 66272.19551 |
| Monotonicity | Not monotonic |
[Figure: Histogram with fixed size bins (bins=50)]
Common values
| Value | Count | Frequency (%) | |
| -200 | 1639 | 17.3% | |
| 89 | 41 | 0.4% | |
| 65 | 37 | 0.4% | |
| 41 | 36 | 0.4% | |
| 122 | 36 | 0.4% | |
| 93 | 36 | 0.4% | |
| 180 | 35 | 0.4% | |
| 132 | 35 | 0.4% | |
| 95 | 35 | 0.4% | |
| 51 | 34 | 0.4% | |
| Other values (916) | 7393 | 78.1% | |
| (Missing) | 114 | 1.2% |
Minimum 5 values
| Value | Count | Frequency (%) | |
| -200 | 1639 | 17.3% | |
| 2 | 1 | < 0.1% | |
| 4 | 1 | < 0.1% | |
| 6 | 1 | < 0.1% | |
| 7 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
| 1479 | 1 | < 0.1% | |
| 1389 | 2 | < 0.1% | |
| 1369 | 1 | < 0.1% | |
| 1358 | 1 | < 0.1% | |
| 1345 | 1 | < 0.1% |
PT08.S3(NOx)
| Distinct | 1222 |
|---|---|
| Distinct (%) | 13.1% |
| Missing | 114 |
| Missing (%) | 1.2% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 794.9901678 |
|---|---|
| Minimum | -200 |
| Maximum | 2683 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 74.0 KiB |
Quantile statistics
| Minimum | -200 |
|---|---|
| 5-th percentile | 410 |
| Q1 | 637 |
| median | 794 |
| Q3 | 960 |
| 95-th percentile | 1281.2 |
| Maximum | 2683 |
| Range | 2883 |
| Interquartile range (IQR) | 323 |
Descriptive statistics
| Standard deviation | 321.9935516 |
|---|---|
| Coefficient of variation (CV) | 0.4050283446 |
| Kurtosis | 3.104825915 |
| Mean | 794.9901678 |
| Median Absolute Deviation (MAD) | 161 |
| Skewness | -0.3847597666 |
| Sum | 7438723 |
| Variance | 103679.8473 |
| Monotonicity | Not monotonic |
[Figure: Histogram with fixed size bins (bins=50)]
Common values
| Value | Count | Frequency (%) | |
| -200 | 366 | 3.9% | |
| 767 | 25 | 0.3% | |
| 733 | 25 | 0.3% | |
| 846 | 25 | 0.3% | |
| 765 | 23 | 0.2% | |
| 876 | 23 | 0.2% | |
| 845 | 22 | 0.2% | |
| 800 | 22 | 0.2% | |
| 872 | 22 | 0.2% | |
| 816 | 22 | 0.2% | |
| Other values (1212) | 8782 | 92.7% | |
| (Missing) | 114 | 1.2% |
Minimum 5 values
| Value | Count | Frequency (%) | |
| -200 | 366 | 3.9% | |
| 322 | 1 | < 0.1% | |
| 325 | 2 | < 0.1% | |
| 328 | 1 | < 0.1% | |
| 330 | 2 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
| 2683 | 1 | < 0.1% | |
| 2559 | 1 | < 0.1% | |
| 2542 | 1 | < 0.1% | |
| 2331 | 1 | < 0.1% | |
| 2327 | 1 | < 0.1% |
NO2(GT)
| Distinct | 284 |
|---|---|
| Distinct (%) | 3.0% |
| Missing | 114 |
| Missing (%) | 1.2% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 58.1488725 |
|---|---|
| Minimum | -200 |
| Maximum | 340 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 74.0 KiB |
Quantile statistics
| Minimum | -200 |
|---|---|
| 5-th percentile | -200 |
| Q1 | 53 |
| median | 96 |
| Q3 | 133 |
| 95-th percentile | 194 |
| Maximum | 340 |
| Range | 540 |
| Interquartile range (IQR) | 80 |
Descriptive statistics
| Standard deviation | 126.9404553 |
|---|---|
| Coefficient of variation (CV) | 2.183025221 |
| Kurtosis | 0.2755990718 |
| Mean | 58.1488725 |
| Median Absolute Deviation (MAD) | 40 |
| Skewness | -1.22562964 |
| Sum | 544099 |
| Variance | 16113.87918 |
| Monotonicity | Not monotonic |
[Figure: Histogram with fixed size bins (bins=50)]
Common values
| Value | Count | Frequency (%) | |
| -200 | 1642 | 17.3% | |
| 97 | 78 | 0.8% | |
| 119 | 77 | 0.8% | |
| 117 | 77 | 0.8% | |
| 114 | 75 | 0.8% | |
| 101 | 75 | 0.8% | |
| 95 | 75 | 0.8% | |
| 110 | 74 | 0.8% | |
| 115 | 73 | 0.8% | |
| 107 | 72 | 0.8% | |
| Other values (274) | 7039 | 74.3% | |
| (Missing) | 114 | 1.2% |
Minimum 5 values
| Value | Count | Frequency (%) | |
| -200 | 1642 | 17.3% | |
| 2 | 1 | < 0.1% | |
| 3 | 1 | < 0.1% | |
| 5 | 2 | < 0.1% | |
| 7 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
| 340 | 1 | < 0.1% | |
| 333 | 1 | < 0.1% | |
| 326 | 1 | < 0.1% | |
| 322 | 1 | < 0.1% | |
| 312 | 1 | < 0.1% |
PT08.S4(NO2)
| Distinct | 1604 |
|---|---|
| Distinct (%) | 17.1% |
| Missing | 114 |
| Missing (%) | 1.2% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1391.479641 |
|---|---|
| Minimum | -200 |
| Maximum | 2775 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 74.0 KiB |
Quantile statistics
| Minimum | -200 |
|---|---|
| 5-th percentile | 757 |
| Q1 | 1185 |
| median | 1446 |
| Q3 | 1662 |
| 95-th percentile | 2020.2 |
| Maximum | 2775 |
| Range | 2975 |
| Interquartile range (IQR) | 477 |
Descriptive statistics
| Standard deviation | 467.2101246 |
|---|---|
| Coefficient of variation (CV) | 0.3357649734 |
| Kurtosis | 3.267027856 |
| Mean | 1391.479641 |
| Median Absolute Deviation (MAD) | 236 |
| Skewness | -1.244109947 |
| Sum | 13020075 |
| Variance | 218285.3005 |
| Monotonicity | Not monotonic |
[Figure: Histogram with fixed size bins (bins=50)]
Common values
| Value | Count | Frequency (%) | |
| -200 | 366 | 3.9% | |
| 1488 | 24 | 0.3% | |
| 1580 | 22 | 0.2% | |
| 1539 | 21 | 0.2% | |
| 1467 | 20 | 0.2% | |
| 1638 | 19 | 0.2% | |
| 1490 | 18 | 0.2% | |
| 1418 | 18 | 0.2% | |
| 1570 | 17 | 0.2% | |
| 1473 | 17 | 0.2% | |
| Other values (1594) | 8815 | 93.1% | |
| (Missing) | 114 | 1.2% |
Minimum 5 values
| Value | Count | Frequency (%) | |
| -200 | 366 | 3.9% | |
| 551 | 1 | < 0.1% | |
| 559 | 1 | < 0.1% | |
| 561 | 1 | < 0.1% | |
| 579 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
| 2775 | 1 | < 0.1% | |
| 2746 | 1 | < 0.1% | |
| 2691 | 1 | < 0.1% | |
| 2684 | 1 | < 0.1% | |
| 2679 | 1 | < 0.1% |
PT08.S5(O3)
| Distinct | 1744 |
|---|---|
| Distinct (%) | 18.6% |
| Missing | 114 |
| Missing (%) | 1.2% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 975.0720316 |
|---|---|
| Minimum | -200 |
| Maximum | 2523 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 74.0 KiB |
Quantile statistics
| Minimum | -200 |
|---|---|
| 5-th percentile | 348 |
| Q1 | 700 |
| median | 942 |
| Q3 | 1255 |
| 95-th percentile | 1750 |
| Maximum | 2523 |
| Range | 2723 |
| Interquartile range (IQR) | 555 |
Descriptive statistics
| Standard deviation | 456.9381845 |
|---|---|
| Coefficient of variation (CV) | 0.4686199272 |
| Kurtosis | 0.6382966399 |
| Mean | 975.0720316 |
| Median Absolute Deviation (MAD) | 272 |
| Skewness | -0.03466187982 |
| Sum | 9123749 |
| Variance | 208792.5044 |
| Monotonicity | Not monotonic |
[Figure: Histogram with fixed size bins (bins=50)]
Common values
| Value | Count | Frequency (%) | |
| -200 | 366 | 3.9% | |
| 836 | 20 | 0.2% | |
| 825 | 20 | 0.2% | |
| 826 | 19 | 0.2% | |
| 926 | 18 | 0.2% | |
| 799 | 17 | 0.2% | |
| 777 | 17 | 0.2% | |
| 923 | 16 | 0.2% | |
| 905 | 16 | 0.2% | |
| 891 | 16 | 0.2% | |
| Other values (1734) | 8832 | 93.3% | |
| (Missing) | 114 | 1.2% |
Minimum 5 values
| Value | Count | Frequency (%) | |
| -200 | 366 | 3.9% | |
| 221 | 1 | < 0.1% | |
| 225 | 1 | < 0.1% | |
| 227 | 1 | < 0.1% | |
| 232 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
| 2523 | 1 | < 0.1% | |
| 2522 | 1 | < 0.1% | |
| 2519 | 1 | < 0.1% | |
| 2515 | 1 | < 0.1% | |
| 2494 | 1 | < 0.1% |
T
| Distinct | 437 |
|---|---|
| Distinct (%) | 4.7% |
| Missing | 114 |
| Missing (%) | 1.2% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 9.778305012 |
|---|---|
| Minimum | -200 |
| Maximum | 44.6 |
| Zeros | 1 |
| Zeros (%) | < 0.1% |
| Memory size | 74.0 KiB |
Quantile statistics
| Minimum | -200 |
|---|---|
| 5-th percentile | 2.5 |
| Q1 | 10.9 |
| median | 17.2 |
| Q3 | 24.1 |
| 95-th percentile | 34.3 |
| Maximum | 44.6 |
| Range | 244.6 |
| Interquartile range (IQR) | 13.2 |
Descriptive statistics
| Standard deviation | 43.20362306 |
|---|---|
| Coefficient of variation (CV) | 4.418314116 |
| Kurtosis | 18.77480657 |
| Mean | 9.778305012 |
| Median Absolute Deviation (MAD) | 6.6 |
| Skewness | -4.445467033 |
| Sum | 91495.6 |
| Variance | 1866.553046 |
| Monotonicity | Not monotonic |
[Figure: Histogram with fixed size bins (bins=50)]
Common values
| Value | Count | Frequency (%) | |
| -200 | 366 | 3.9% | |
| 20.8 | 57 | 0.6% | |
| 21.3 | 54 | 0.6% | |
| 20.2 | 51 | 0.5% | |
| 13.8 | 51 | 0.5% | |
| 12 | 49 | 0.5% | |
| 15.6 | 49 | 0.5% | |
| 12.3 | 49 | 0.5% | |
| 16.3 | 48 | 0.5% | |
| 19.8 | 48 | 0.5% | |
| Other values (427) | 8535 | 90.1% | |
| (Missing) | 114 | 1.2% |
Minimum 5 values
| Value | Count | Frequency (%) | |
| -200 | 366 | 3.9% | |
| -1.9 | 1 | < 0.1% | |
| -1.4 | 1 | < 0.1% | |
| -1.3 | 2 | < 0.1% | |
| -1.2 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
| 44.6 | 1 | < 0.1% | |
| 44.3 | 1 | < 0.1% | |
| 43.4 | 1 | < 0.1% | |
| 43.1 | 1 | < 0.1% | |
| 42.8 | 3 | < 0.1% |
RH
| Distinct | 754 |
|---|---|
| Distinct (%) | 8.1% |
| Missing | 114 |
| Missing (%) | 1.2% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 39.48537993 |
|---|---|
| Minimum | -200 |
| Maximum | 88.7 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 74.0 KiB |
Quantile statistics
| Minimum | -200 |
|---|---|
| 5-th percentile | 15 |
| Q1 | 34.1 |
| median | 48.6 |
| Q3 | 61.9 |
| 95-th percentile | 77.6 |
| Maximum | 88.7 |
| Range | 288.7 |
| Interquartile range (IQR) | 27.8 |
Descriptive statistics
| Standard deviation | 51.21614497 |
|---|---|
| Coefficient of variation (CV) | 1.297091355 |
| Kurtosis | 15.76415389 |
| Mean | 39.48537993 |
| Median Absolute Deviation (MAD) | 13.9 |
| Skewness | -3.932407357 |
| Sum | 369464.7 |
| Variance | 2623.093506 |
| Monotonicity | Not monotonic |
[Figure: Histogram with fixed size bins (bins=50)]
Common values
| Value | Count | Frequency (%) | |
| -200 | 366 | 3.9% | |
| 53.1 | 31 | 0.3% | |
| 57.9 | 30 | 0.3% | |
| 47.8 | 30 | 0.3% | |
| 45.9 | 27 | 0.3% | |
| 60.8 | 27 | 0.3% | |
| 50.1 | 26 | 0.3% | |
| 47.6 | 26 | 0.3% | |
| 50.9 | 26 | 0.3% | |
| 57.6 | 26 | 0.3% | |
| Other values (744) | 8742 | 92.3% | |
| (Missing) | 114 | 1.2% |
Minimum 5 values
| Value | Count | Frequency (%) | |
| -200 | 366 | 3.9% | |
| 9.2 | 2 | < 0.1% | |
| 9.3 | 1 | < 0.1% | |
| 9.6 | 1 | < 0.1% | |
| 9.8 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
| 88.7 | 1 | < 0.1% | |
| 87.2 | 1 | < 0.1% | |
| 87.1 | 1 | < 0.1% | |
| 87 | 1 | < 0.1% | |
| 86.6 | 2 | < 0.1% |
AH
| Distinct | 6684 |
|---|---|
| Distinct (%) | 71.4% |
| Missing | 114 |
| Missing (%) | 1.2% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | -6.837603644 |
|---|---|
| Minimum | -200 |
| Maximum | 2.231 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 74.0 KiB |
Quantile statistics
| Minimum | -200 |
|---|---|
| 5-th percentile | 0.29506 |
| Q1 | 0.6923 |
| median | 0.9768 |
| Q3 | 1.2962 |
| 95-th percentile | 1.72044 |
| Maximum | 2.231 |
| Range | 202.231 |
| Interquartile range (IQR) | 0.6039 |
Descriptive statistics
| Standard deviation | 38.97667017 |
|---|---|
| Coefficient of variation (CV) | -5.700340674 |
| Kurtosis | 20.61309172 |
| Mean | -6.837603644 |
| Median Absolute Deviation (MAD) | 0.3022 |
| Skewness | -4.75457029 |
| Sum | -63979.4573 |
| Variance | 1519.180817 |
| Monotonicity | Not monotonic |
[Figure: Histogram with fixed size bins (bins=50)]
Common values
| Value | Count | Frequency (%) | |
| -200 | 366 | 3.9% | |
| 1.1199 | 6 | 0.1% | |
| 0.8394 | 6 | 0.1% | |
| 0.9684 | 6 | 0.1% | |
| 0.7487 | 6 | 0.1% | |
| 0.9722 | 6 | 0.1% | |
| 0.8736 | 5 | 0.1% | |
| 0.9271 | 5 | 0.1% | |
| 0.8325 | 5 | 0.1% | |
| 0.6686 | 5 | 0.1% | |
| Other values (6674) | 8941 | 94.4% | |
| (Missing) | 114 | 1.2% |
Minimum 5 values
| Value | Count | Frequency (%) | |
| -200 | 366 | 3.9% | |
| 0.1847 | 1 | < 0.1% | |
| 0.1862 | 1 | < 0.1% | |
| 0.191 | 1 | < 0.1% | |
| 0.1975 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
| 2.231 | 1 | < 0.1% | |
| 2.1806 | 1 | < 0.1% | |
| 2.1766 | 1 | < 0.1% | |
| 2.1719 | 1 | < 0.1% | |
| 2.1395 | 1 | < 0.1% |
Pearson's r
The Pearson correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale (with positive scale factors) of the two variables. This means that for an exactly linear relationship r is ±1 however steep the line is; only the sign of the slope matters. To calculate r for two variables X and Y, divide the covariance of X and Y by the product of their standard deviations.
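The definition above translates directly into code. Below is a minimal plain-Python sketch, assuming two equal-length lists with no missing values. The 1/n (or 1/(n - 1)) normalisation cancels between numerator and denominator, so it is omitted. `pearson_r` is a hypothetical helper. In pandas, `DataFrame.corr(method="pearson")` computes the same quantity.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r: covariance of X and Y divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)  # the normalisation factors cancel, so they are left out


xs = [1, 2, 3, 4, 5]
print(pearson_r(xs, [2 * x + 7 for x in xs]))     # exactly linear, positive slope
print(pearson_r(xs, [-0.5 * x + 1 for x in xs]))  # exactly linear, negative slope
print(pearson_r(xs, [1, 3, 2, 5, 4]))             # noisy increasing relationship
```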
Spearman's ρ
The Spearman rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables. It is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, divide the covariance of the rank variables of X and Y by the product of their standard deviations.
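Following that definition, ρ is Pearson's r applied to ranks. Below is a minimal plain-Python sketch that gives tied values the average of the ranks they span. Assigning average ranks to ties is an assumption about matching the usual pandas/SciPy convention. `average_ranks` and `spearman_rho` are hypothetical helpers. A constant column has zero rank variance, so ρ is undefined for it.

```python
import math


def average_ranks(values):
    """Rank values 1..n, giving tied values the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r computed on the rank variables of X and Y."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]  # nonlinear but strictly increasing
print(spearman_rho(xs, ys))  # 1 up to floating-point rounding
```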
Kendall's τ
Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, count the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n-1)/2.
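Below is a minimal plain-Python sketch of the pair-counting definition above, which is the tau-a variant. `kendall_tau_a` is a hypothetical helper. Pandas' `DataFrame.corr(method="kendall")` relies on SciPy's `kendalltau`, whose default tau-b variant adjusts the denominator for ties. On heavily tied columns such as CO(GT), with only 97 distinct values, that result can therefore differ from this sketch.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / (n * (n - 1) / 2).

    Pairs tied in X or Y count as neither concordant nor discordant,
    but they still sit in the denominator.
    """
    n = len(xs)
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))  # (5 - 1) / 6, about 0.667
```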
Phik (φk)
Phik (φk) is a new and practical correlation coefficient. It works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the phik project.
First rows
| | Date | Time | CO(GT) | PT08.S1(CO) | NMHC(GT) | C6H6(GT) | PT08.S2(NMHC) | NOx(GT) | PT08.S3(NOx) | NO2(GT) | PT08.S4(NO2) | PT08.S5(O3) | T | RH | AH | Unnamed: 15 | Unnamed: 16 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2004/3/10 | 18:00:00 | 2.6 | 1360.0 | 150.0 | 11.9 | 1046.0 | 166.0 | 1056.0 | 113.0 | 1692.0 | 1268.0 | 13.6 | 48.9 | 0.7578 | NaN | NaN |
| 1 | 2004/3/10 | 19:00:00 | 2.0 | 1292.0 | 112.0 | 9.4 | 955.0 | 103.0 | 1174.0 | 92.0 | 1559.0 | 972.0 | 13.3 | 47.7 | 0.7255 | NaN | NaN |
| 2 | 2004/3/10 | 20:00:00 | 2.2 | 1402.0 | 88.0 | 9.0 | 939.0 | 131.0 | 1140.0 | 114.0 | 1555.0 | 1074.0 | 11.9 | 54.0 | 0.7502 | NaN | NaN |
| 3 | 2004/3/10 | 21:00:00 | 2.2 | 1376.0 | 80.0 | 9.2 | 948.0 | 172.0 | 1092.0 | 122.0 | 1584.0 | 1203.0 | 11.0 | 60.0 | 0.7867 | NaN | NaN |
| 4 | 2004/3/10 | 22:00:00 | 1.6 | 1272.0 | 51.0 | 6.5 | 836.0 | 131.0 | 1205.0 | 116.0 | 1490.0 | 1110.0 | 11.2 | 59.6 | 0.7888 | NaN | NaN |
| 5 | 2004/3/10 | 23:00:00 | 1.2 | 1197.0 | 38.0 | 4.7 | 750.0 | 89.0 | 1337.0 | 96.0 | 1393.0 | 949.0 | 11.2 | 59.2 | 0.7848 | NaN | NaN |
| 6 | 2004/3/11 | 0:00:00 | 1.2 | 1185.0 | 31.0 | 3.6 | 690.0 | 62.0 | 1462.0 | 77.0 | 1333.0 | 733.0 | 11.3 | 56.8 | 0.7603 | NaN | NaN |
| 7 | 2004/3/11 | 1:00:00 | 1.0 | 1136.0 | 31.0 | 3.3 | 672.0 | 62.0 | 1453.0 | 76.0 | 1333.0 | 730.0 | 10.7 | 60.0 | 0.7702 | NaN | NaN |
| 8 | 2004/3/11 | 2:00:00 | 0.9 | 1094.0 | 24.0 | 2.3 | 609.0 | 45.0 | 1579.0 | 60.0 | 1276.0 | 620.0 | 10.7 | 59.7 | 0.7648 | NaN | NaN |
| 9 | 2004/3/11 | 3:00:00 | 0.6 | 1010.0 | 19.0 | 1.7 | 561.0 | -200.0 | 1705.0 | -200.0 | 1235.0 | 501.0 | 10.3 | 60.2 | 0.7517 | NaN | NaN |
Last rows
| | Date | Time | CO(GT) | PT08.S1(CO) | NMHC(GT) | C6H6(GT) | PT08.S2(NMHC) | NOx(GT) | PT08.S3(NOx) | NO2(GT) | PT08.S4(NO2) | PT08.S5(O3) | T | RH | AH | Unnamed: 15 | Unnamed: 16 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 9461 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 9462 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 9463 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 9464 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 9465 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 9466 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 9467 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 9468 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 9469 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 9470 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
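The trailing rows are entirely empty, as are the `Unnamed: 15` and `Unnamed: 16` columns. The two columns are plausibly produced by trailing separators in the source CSV, though that is an assumption. In pandas the usual clean-up is `df.dropna(how="all").dropna(axis=1, how="all")`. The plain-Python sketch below does the same on a header plus a list of row tuples, with `None` for missing cells. `drop_empty` is a hypothetical helper.

```python
def drop_empty(header, rows):
    """Hypothetical helper: drop all-None rows, then columns that are None in every remaining row."""
    kept_rows = [row for row in rows if any(cell is not None for cell in row)]
    keep_cols = [
        j for j in range(len(header))
        if any(row[j] is not None for row in kept_rows)
    ]
    new_header = [header[j] for j in keep_cols]
    new_rows = [tuple(row[j] for j in keep_cols) for row in kept_rows]
    return new_header, new_rows


header = ["Date", "Time", "CO(GT)", "Unnamed: 15", "Unnamed: 16"]
rows = [
    ("2004/3/10", "18:00:00", 2.6, None, None),
    ("2004/3/10", "19:00:00", 2.0, None, None),
    (None, None, None, None, None),  # blank trailing row
]
print(drop_empty(header, rows))
```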